Kohei MIYASE Ryota SAKAI Xiaoqing WEN Masao ASO Hiroshi FURUKAWA Yuta YAMATO Seiji KAJIHARA
Test power has become a critical issue, especially for low-power devices with deeply optimized functional power profiles. Particularly, excessive capture power in at-speed scan testing may cause timing failures that result in test-induced yield loss. This has made capture-safety checking mandatory for test vectors. However, previous capture-safety checking metrics suffer from inadequate accuracy since they ignore the time relations among different transitions caused by a test vector in a circuit. This paper presents a novel metric called the Transition-Time-Relation-based (TTR) metric which takes transition time relations into consideration in capture-safety checking. Detailed analysis done on an industrial circuit has demonstrated the advantages of the TTR metric. Capture-safety checking with the TTR metric greatly improves the accuracy of test vector sign-off and low-capture-power test generation.
Senling WANG Yasuo SATO Seiji KAJIHARA Kohei MIYASE
In this paper, we propose a novel method for logic BIST that reduces the scan-test power consumed when test responses are shifted out. The proposed method overwrites the values of some flip-flops (FFs) before scan shifting starts so as to reduce the switching activity during scan-out. To limit the fault coverage loss caused by overwriting FF values before the captured values are observed, the method employs multi-cycle scan testing with partial observation. To obtain a larger scan-out power reduction with less fault coverage loss while keeping hardware overhead from growing, the FFs to be overwritten are selected at a predetermined ratio. Three value-filling methods are provided for the overwriting so as to achieve a larger scan-out power reduction. Experiments on ITC'99 benchmark circuits show the effectiveness of the methods: with 20% of the FFs selected, scan-out power is reduced by nearly 51% and peak scan-out power by 57% with little fault coverage loss, while the hardware overhead is only 0.05%.
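To illustrate why overwriting a few FF values before scan-out can lower switching activity, here is a minimal sketch. It uses the number of adjacent value changes in a scan chain as a rough proxy for shift activity. The fill rule (copy the previous cell's value) and the choice of cells are hypothetical; the abstract does not describe the three filling methods or the selection procedure.

```python
def adjacent_transitions(bits):
    """Count neighbouring scan cells that hold different values.

    Each such pair toggles cells as the chain shifts, so the count is a
    rough proxy for scan-out switching activity.
    """
    return sum(a != b for a, b in zip(bits, bits[1:]))


def overwrite_selected(bits, selected):
    """Copy the previous cell's value into each selected cell.

    Hypothetical fill rule, used only for illustration.
    """
    out = list(bits)
    for i in sorted(selected):
        if i > 0:
            out[i] = out[i - 1]
    return out


captured = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]  # made-up captured response
selected = {1, 5}                          # 20% of the 10 cells, chosen arbitrarily
filled = overwrite_selected(captured, selected)

print(adjacent_transitions(captured), adjacent_transitions(filled))
```

The overwritten cells can no longer be observed at scan-out, which is the fault coverage loss that the multi-cycle test with partial observation is meant to offset.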
XiaoBo JIANG DeSheng YE HongYuan LI WenTao WU XiangMin XU
We propose an asynchronous datapath for a low-density parity-check decoder to decrease power consumption. The asynchronous design reduces glitches and redundant computations. Taking advantage of the statistical characteristics of the input data, we develop novel key arithmetic elements in the datapath to reduce redundant computations. Two other types of datapath, a normal synchronous design and a clock-gating design, are implemented for comparison with the proposed design. The three designs use similar architectures and realize the same function in the 0.18 µm process of the Semiconductor Manufacturing International Corporation. Post-layout results show that the proposed asynchronous design has the lowest power consumption, saving 48.7% and 21.9% more power than the normal synchronous and clock-gating designs, respectively. The performance of the proposed datapath is slightly worse than that of the clock-gating design but better than that of the synchronous design. The proposed design is approximately 7% larger than the other two designs.
Song JIA Li LIU Xiayu LI Fengfeng WU Yuan WANG Ganggang ZHANG
Information security is seriously threatened by differential power analysis (DPA). Delay-based dual-rail precharge logic (DDPL) is an effective countermeasure against such attacks. However, conventional DDPL convertors have some shortcomings. In this paper, we propose improved convertor pairs based on dynamic logic and a sense amplifier (SA). Compared with the reference CMOS-to-DDPL convertor, our scheme reduces power consumption by 69%. For the DDPL-to-CMOS convertor, speed and power are improved by 39% and 54%, respectively.
Tanvir AHMED Jun YAO Yuko HARA-AZUMI Shigeru YAMASHITA Yasuhiko NAKASHIMA
Fault tolerance is playing an increasingly important role in covering the rising soft/hard error rates that accompany advances in process technology. Research shows that wear-out faults have a gradual onset, starting as timing faults and eventually becoming permanent faults. Error detection is thus required to maintain execution correctness. However, many highly dependable methods for covering permanent faults are over-designed: they check very frequently because they lack awareness of the fault probability of the circuits used for pending executions. To address this over-checking problem, we introduce a metric for permanent defects, the operation defective probability (ODP), which quantitatively guides the placement of check operations at critical positions only. With this selective checking approach, we achieve near-100% dependability with about 53% fewer check operations than the ideal reliable method, which performs exhaustive checks to guarantee zero error propagation. As a result, power consumption is reduced by 21.7% by avoiding the non-critical checks of the over-designed approach.
Hisashi IWAMOTO Yuji YANO Yasuto KURODA Koji YAMAMOTO Kazunari INOUE Ikuo OKA
Ternary content addressable memory (TCAM) is a popular LSI for high-throughput forwarding engines in routers. However, the unique structure of TCAM consumes a large amount of power, which restricts its ability to handle large lookup tables in IP routers. In this paper, we propose a commodity-memory-based hardware architecture for the forwarding information base (FIB) application that solves the substantial problems of power and density. The proposed architecture is evaluated with a test chip fabricated in 40 nm embedded DRAM (eDRAM) technology. The measured power is far lower than that of conventional TCAM-based designs, and the energy metric reaches 0.01 fJ/bit/search. The power consumption is about 0.5 W at 250 Msps with 8M entries.
Teerachot SIRIBURANON Takahiro SATO Ahmed MUSA Wei DENG Kenichi OKADA Akira MATSUZAWA
This paper presents a 20 GHz push-push VCO realized by a 10 GHz super-harmonic coupled quadrature oscillator for a quadrature 60 GHz frequency synthesizer. The output nodes are peaked by a tunable second harmonic resonator. The proposed VCO is implemented in 65 nm CMOS process. It achieves a tuning range of 3.5 GHz from 16.1 GHz to 19.6 GHz with a phase noise of -106 dBc/Hz at 1 MHz offset. The power consumption of the core oscillators is 10.3 mW and an FoM of -181.3 dBc/Hz is achieved.
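The reported figure of merit can be related to the other numbers in the abstract with the commonly used VCO FoM definition, FoM = L(Δf) − 20·log10(f0/Δf) + 10·log10(P / 1 mW). The sketch below assumes that definition. The abstract does not state the carrier frequency at which the phase noise and FoM were measured, so both ends of the tuning range are evaluated.

```python
import math


def vco_fom(phase_noise_dbc_hz, f_carrier_hz, f_offset_hz, power_w):
    """Common VCO figure of merit in dBc/Hz (more negative is better)."""
    return (phase_noise_dbc_hz
            - 20 * math.log10(f_carrier_hz / f_offset_hz)
            + 10 * math.log10(power_w / 1e-3))


# Values taken from the abstract; the carrier frequency is an assumption.
fom_low = vco_fom(-106.0, 16.1e9, 1e6, 10.3e-3)
fom_high = vco_fom(-106.0, 19.6e9, 1e6, 10.3e-3)

print(f"FoM at 16.1 GHz: {fom_low:.1f} dBc/Hz")
print(f"FoM at 19.6 GHz: {fom_high:.1f} dBc/Hz")
```

Under this definition the reported −181.3 dBc/Hz falls between the two computed values, which is consistent with a measurement taken inside the tuning range.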
Hao ZHANG Mengshu HUANG Yimeng ZHANG Tsutomu YOSHIHARA
This paper proposes a novel approach for implementing an ultra-low-power voltage reference using a self-cascode MOSFET structure operating in the subthreshold region with a self-biased body effect. The difference between the two gate-source voltages in the structure enables the voltage reference circuit to produce a low output voltage below the threshold voltage. The circuit is designed with only MOSFETs and fabricated in standard 0.18-µm CMOS technology. Measurements show that the reference voltage is about 107.5 mV and the temperature coefficient is about 40 ppm/°C over a range from -20 °C to 80 °C. The voltage line sensitivity is 0.017%/V. The minimum supply voltage is 0.85 V, and the supply current is approximately 24 nA at 80 °C. The occupied chip area is around 0.028 mm².
Muchen LI Jinjia ZHOU Dajiang ZHOU Xiao PENG Satoshi GOTO
As the successor to the H.264/AVC video compression standard, High Efficiency Video Coding (HEVC) will play an important role in video coding. In its deblocking filter, HEVC inherits the basic properties of H.264/AVC and adds some new features. Based on these differences, this paper introduces a novel dual-mode deblocking filter architecture that supports both the HEVC and H.264/AVC standards. For HEVC, the proposed filtering scheme based on a symmetric unified-cross unit (SUCU) greatly reduces the design complexity; as a result, processing a 16×16 block takes 24 clock cycles. For H.264/AVC, processing a 16×16 macro-block (MB) takes 48 clock cycles. In synthesis, the proposed architecture occupies 41.6k equivalent gates at 200 MHz in the SMIC 65 nm library, which satisfies the throughput requirement of super hi-vision (SHV) at 60 fps. With the filter reuse scheme, the unified design for the two standards saves 30% of the gate count in the filter part compared with dedicated designs. In addition, the total power consumption can be reduced by 57.2% with a skipping mode for edges that need not be filtered.
Ryota SEKIMOTO Akira SHIKATA Kentaro YOSHIOKA Tadahiro KURODA Hiroki ISHIKURO
An ultra-low-power, low-voltage successive-approximation-register (SAR) analog-to-digital converter (ADC) with a timing-optimized asynchronous clock generator is presented. By calibrating the delay of the clock generator, the DAC settling wait time is adaptively optimized to counter device mismatch. This technique improves the maximum sampling frequency by 40% while keeping the ENOB around 7 bits at a 0.4 V analog and 0.7 V digital power supply voltage. The dependence of the delay time on the power supply has little effect on conversion accuracy: decreasing the supply voltage by 9% degrades the ENOB by only 0.1 bit, and the proposed calibration can provide delay margins for high voltage swing. The prototype ADC, fabricated in a 40 nm CMOS process, achieves a figure of merit (FoM) of 8.75 fJ/conversion-step at 2.048 MS/s with a 0.6 V analog and 0.7 V digital power supply voltage. The ADC can operate from 50 S/s to 8 MS/s while keeping the ENOB over 7.5 bits.
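The FoM quoted in fJ/conversion-step is conventionally the Walden figure of merit, FoM = P / (2^ENOB · fs). The sketch below assumes that definition and shows how FoM, ENOB, sampling rate, and power relate. The abstract does not give the ENOB or the power at the 2.048 MS/s operating point, so the 7.5-bit ENOB used here is an assumption for illustration and the resulting power is not a reported value.

```python
def walden_fom(power_w, enob_bits, fs_hz):
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2 ** enob_bits * fs_hz)


def implied_power(fom_j, enob_bits, fs_hz):
    """Power that a given FoM implies at a given ENOB and sampling rate."""
    return fom_j * 2 ** enob_bits * fs_hz


FOM = 8.75e-15        # from the abstract
FS = 2.048e6          # from the abstract
ASSUMED_ENOB = 7.5    # assumption: not stated for this operating point

p = implied_power(FOM, ASSUMED_ENOB, FS)
print(f"implied power: {p * 1e6:.2f} uW")
```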
Hyuk-Jun LEE Seung-Chul KIM Eui-Young CHUNG
A packet memory stores packets in internet routers and typically requires RTT×C of buffer space, e.g. several GBytes, where RTT is the average round-trip time of a TCP flow and C is the bandwidth of the router's output link. It is implemented with DRAM parts that are accessed in parallel to achieve the required bandwidth. These parts consume significant power in a router, whose scalability is heavily limited by power and heat problems. Previous work shows that the packet memory size can be reduced to RTT×C/√N, where N is the number of long-lived TCP flows. In this paper, we propose a novel packet memory architecture that splits the packet memory into on-chip and off-chip packet memories. We also propose a low-power packet mapping method for this architecture, which estimates the latency of packets and maps packets with small latencies to the on-chip memory. The experimental results show that, in realistic scenarios, the proposed architecture and mapping method reduce the dynamic power consumption of the off-chip memory by as much as 94.1% with only 50% of the packet buffer size suggested by the previous work.
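A short worked example shows the scale of the two buffer-sizing rules. It assumes the rule of thumb RTT×C and the smaller size from earlier buffer-sizing work, commonly stated as RTT×C/√N. The RTT, link rate, and flow count below are made-up inputs, not values from the paper.

```python
import math


def buffer_bytes(rtt_s, link_bps, n_flows=1):
    """Buffer size in bytes: RTT*C for n_flows=1, RTT*C/sqrt(N) otherwise."""
    return rtt_s * link_bps / math.sqrt(n_flows) / 8


# Hypothetical router: 250 ms RTT, 40 Gb/s output link, 10,000 long-lived flows.
RTT_S = 0.25
LINK_BPS = 40e9
N_FLOWS = 10_000

rule_of_thumb = buffer_bytes(RTT_S, LINK_BPS)
reduced = buffer_bytes(RTT_S, LINK_BPS, N_FLOWS)

print(f"RTT*C         : {rule_of_thumb / 1e9:.2f} GB")
print(f"RTT*C/sqrt(N) : {reduced / 1e6:.1f} MB")
```

With these inputs the buffer drops from gigabytes to tens of megabytes, a size at which keeping part of the packet memory on-chip becomes plausible.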
Meng XU Xincun JI Jianhui WU Meng ZHANG
This paper presents a low-power LDPC decoder that can be used in Multimedia Wireless Sensor Networks. Three low-power design techniques are proposed in the decoder design: a layered decoding algorithm, a modified Benes network, and a modified memory bypassing scheme. The proposed decoder is implemented in a TSMC 0.13 µm, 1.2 V CMOS process. Experiments show that at a clock frequency of 32 MHz, the power consumption of the proposed decoder is 38.4 mW, the energy efficiency is 53.3 pJ/bit/iteration, and the core area is 1.8 mm².
In order to reduce the dynamic energy dissipation in CMOS LSIs, it is effective to reduce the frequency of value changes of the signals. In this paper, a data expression with the valid digit and lower digit overflow information is proposed to suppress unnecessary signal changes in integer functional units and registers of general purpose processors. Experimental results show that the proposed method reduces the energy dissipation by 9.8% for benchmark programs.
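The motivation can be seen by counting bit toggles when small signed values pass through a wide register. The sketch below does not reproduce the proposed data expression, whose encoding is not given in the abstract. It only compares the toggles seen by a full 32-bit two's-complement register with those seen by the low 8 bits alone, using a made-up value sequence.

```python
def toggles(values, width):
    """Total bit flips between successive values in a register of `width` bits."""
    mask = (1 << width) - 1
    words = [v & mask for v in values]  # two's-complement truncation
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


seq = [3, -2, 5, -1]       # small values that alternate in sign (illustrative)
full = toggles(seq, 32)    # every sign change flips all upper (sign-extension) bits
low = toggles(seq, 8)      # only the digits that carry information

print(full, low)
```

Most of the flips in the 32-bit case come from sign-extension bits that carry no new information, which is the kind of unnecessary signal change the abstract aims to suppress.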
Mitsuo NAKAMURA Mamoru UGAJIN Mitsuru HARADA
To reduce the power dissipation of the receiver in accordance with the intensity of the received signal, we developed the first intra-symbol intermittent (ISI) radio-frequency (RF) front end with 0.35-µm CMOS technology. In the demodulation mechanism, the RF output of the low-noise amplifier (LNA) is down-converted to an intermediate frequency (IF) by the mixer, and the LNA and mixer operate synchronously and intermittently within the length of a single symbol. Because the time-averaged power consumption is proportional to the operating time, the demodulation can be performed with low power by making the total operating time short. We experimentally demonstrate that demodulation (BPSK: 9.6 kbps) is properly achieved with the operating-time ratio of 12%. This ISI operation of the RF front end is enabled by a newly devised fast-transition LNA and mixer. A theoretical analysis of aliasing noise reveals that RF ISI operation is more useful than current-control with continuous operation and that an operating-time ratio of 10% is optimal.
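The proportionality between time-averaged power and operating time can be written as P_avg = d · P_on + (1 − d) · P_off, where d is the operating-time ratio. The sketch below assumes this simple two-state model. The 10 mW on-state power is a hypothetical value, and the off-state power is assumed negligible.

```python
def average_power(p_on_w, duty, p_off_w=0.0):
    """Time-averaged power of a block that is on for a fraction `duty` of the time."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return duty * p_on_w + (1.0 - duty) * p_off_w


P_ON = 10e-3  # hypothetical on-state power of the LNA and mixer, in watts

p_12 = average_power(P_ON, 0.12)  # operating-time ratio demonstrated in the abstract
p_10 = average_power(P_ON, 0.10)  # ratio the analysis identifies as optimal

print(f"12% duty: {p_12 * 1e3:.2f} mW, 10% duty: {p_10 * 1e3:.2f} mW")
```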
Ce LI Yiping DONG Takahiro WATANABE
Dynamic power gating applied to FPGAs can reduce power consumption effectively. In this paper, we propose a sophisticated routing architecture for a region-oriented FPGA that supports dynamic power gating. This is the first routing solution for dynamic power gating in coarse-grained FPGAs. The paper makes two main contributions. First, it improves the routing resource graph and routing architecture to support special routing for a region-oriented FPGA. Second, some routing channels are made wider to avoid congestion. Experimental results show that the routing area is reduced by 7.7% compared with the symmetric Wilton switch box in the region. In addition, the proposed FPGA architecture with sophisticated P&R can reduce the power consumption of a system implemented on the FPGA.
Haiqi WANG Sheqin DONG Tao LIN Song CHEN Satoshi GOTO
Dual-vdd has been proposed to optimize circuit power without violating performance constraints. Unlike traditional methods, which focus on making full use of the slack of non-critical gates, this paper proposes an efficient min-cut-based voltage assignment algorithm that concentrates on critical gates. The algorithm is then integrated into a search engine that automatically selects suitable voltages for a dual-vdd system. Experimental results show that the search engine always finds a good pair of supply voltages, and that the min-cut-based algorithm outperforms previous voltage assignment work in both power consumption and runtime.
Yukihiro SASAGAWA Jun YAO Takashi NAKADA Yasuhiko NAKASHIMA
Recently, the DVS (Dynamic Voltage Scaling) method has been aggressively applied to processors with Razor flip-flops (FFs). With Razor FFs detecting setup errors, the supply voltage in these processors is scaled down to a level near the critical setup timing for maximum power reduction. However, conventional combinations of Razor and DVS do not tolerate well the error rate variations caused by IR drops and environmental changes. Near the critical setup timing point, even a small change in error rate results in sharp performance degradation. In this paper, we propose RazorProtector, a DVS application method based on a redundant data-path that uses multi-cycle redundant calculation to shorten the recovery penalty after a setup error occurs. A dynamic redundancy-adapting scheme is also given to use the redundant data-path effectively, based on a study of program, device, and error rate characteristics. Our results show that, under a large setup error rate caused by a 10% unwanted voltage drop, RazorProtector with the adaptive redundancy architecture reduces EDP by up to 78% at a voltage scaling slope of 100 µs/V and by up to 88% at 200 µs/V, compared with the traditional DVS method with Razor FFs.
Masashi KONO Akihiro KANBE Hidehiro TOYODA Shinji NISHIMURA
A novel 400-Gb/s (100-Gb/s × 4) physical-layer architecture for the next-generation Ethernet, using 100-Gb/s serial (optical single-wavelength) transmission, is proposed. Next-generation 400-Gb/s Ethernet must satisfy additional market requirements, such as power reduction and further miniaturization, in addition to attaining even higher transmission speed. To satisfy these requirements, a 100-Gb/s × 4 Ethernet physical-layer architecture is proposed. This architecture uses a 100-Gb/s serial (optical single-wavelength) transmission Ethernet and low-power technologies for a multi-lane transmission Ethernet. These technologies are implemented on a 100-Gb/s serial (optical single-wavelength) transmission Ethernet using field-programmable gate arrays (FPGAs). Experimental evaluation of this implementation demonstrates the feasibility of low-power 400-Gb/s Ethernet.
Ki-Sung SOHN Da-In HAN Ki-Ju BAEK Nam-Soo KIM Yeong-Seuk KIM
A new clock gating circuit suitable for shift registers is presented. The proposed circuit consists of basic NOR gates and offers low power consumption and small area. The power consumption of a 16-bit shift register implemented with the proposed clock gating circuit is about 66% lower than that of the conventional design.
Lechang LIU Takayasu SAKURAI Makoto TAKAMIYA
A 315 MHz power-gated ultra-low-power transceiver for wireless sensor networks is developed in 40 nm CMOS. The transceiver features an injection-locked frequency multiplier for carrier generation and a power-gated low noise amplifier with a current second-reuse technique for the receiver front-end. The injection-locked frequency multiplier implements frequency multiplication by edge-combining and thereby achieves a power consumption of 11 µW at 315 MHz. The proposed low noise amplifier consumes 8.4 µW, the lowest power among state-of-the-art designs, with a 7.9 dB noise figure and 20.5 dB gain.